feat: Maarten/aarch64 spark support by mvansegbroeck · Pull Request #268 · NVIDIA-NeMo/Safe-Synthesizer

mvansegbroeck · 2026-03-19T17:37:39Z

Summary

Add DGX Spark (aarch64) support for Safe Synthesizer:

Dockerfile: New containers/Dockerfile.cuda-aarch64 for building on aarch64/Spark
Docs: docs/DGX_SPARK.md — quickstart guide for running NSS on DGX Spark
Dependencies: Platform-specific dependency guards for aarch64 (torchvision, torchao, xformers); add missing onnxruntime and opt_einsum to cu128 extra
Training config: Auto-detect aarch64 and override incompatible defaults (flash attention, quantization)
Secret scanning: Add pragma: allowlist secret for API key placeholder in docs

Pre-Review Checklist

make format && make check or via prek validation.
make test passes locally
make test-e2e passes locally
make test-ci-container passes locally (recommended)
GPU CI status check passes -- comment /sync on this PR to trigger a run (auto-triggers on ready-for-review)

Pre-Merge Checklist

New or updated tests for any fix or new behavior
Updated documentation for new features and behaviors, including docstrings for API docs.

binaryaaron

claude was a liar about unsloth and aarch64. also, pull in this change and test again

pyproject.toml

binaryaaron

if we want to support dgx spark quickly, the different install path in the container is fine but it's not proper support. let's make a follow up for this - we'll need to do a proper spike on cross-platform support + cuda version support.

src/nemo_safe_synthesizer/generation/vllm_backend.py

containers/Dockerfile.spark

binaryaaron · 2026-03-24T06:57:48Z

pyproject.toml

  "vllm==0.15.0",
-  "xformers==v0.0.33.post2; sys_platform == 'linux'",
+  "xformers==v0.0.33.post2; sys_platform == 'linux' and platform_machine == 'x86_64'",
 ]


this is probably going to be a whole thing but we should make a new dep group or do better platform resolution for cross-platform deps.

Agree this needs a proper spike. For now the platform markers are minimal and don't break x86_64 right?

hmmm do we want to keep the sys_platform markers? locking is borked in CI right now and I've been fighting it by putting linux markers in

We really will need a spark and/or station as part of our github runners so we make sure this continues to work.

Let me know what to change here.

binaryaaron

approving conditionally so i don't block while i'm out

binaryaaron · 2026-03-24T14:21:27Z

We've also got to publish the container somewhere in the future, same with the cuda one.

- containers/Dockerfile.spark: container-based install using nvcr.io/nvidia/vllm:26.02-py3 - docs/DGX_SPARK.md: quick start guide (build + run in 2 steps) - pyproject.toml: platform markers for aarch64-incompatible packages (faiss-gpu-cu12, torchvision+cu128, torchao, xformers) - config/training.py: auto-fallback Flash Attention 3 to sdpa on aarch64 - vllm_backend.py: handle vllm versions without attention_config kwarg Signed-off-by: mvansegbroeck <mvansegbroeck@gmail.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: mvansegbroeck <mvansegbroeck@gmail.com>

onnxruntime is required by gliner but not always resolved transitively on aarch64. opt_einsum is directly imported in dp_transformers/linear.py but was never declared. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: mvansegbroeck <mvansegbroeck@gmail.com>

Signed-off-by: mvansegbroeck <mvansegbroeck@gmail.com>

…lockfile Signed-off-by: mvansegbroeck <mvansegbroeck@gmail.com>

mckornfield · 2026-04-01T02:02:25Z

containers/Dockerfile.cuda-aarch64

+
+# 2. Torch-dependent packages — --no-deps preserves the container's torch/CUDA
+RUN pip install --no-deps \
+    peft accelerate bitsandbytes datasets==4.3.0 trl==0.26.1 \


🤔 should we add an extras group for them? mostly to not have these versions be different than what's in the lockfile

--no-deps install is intentional to preserve the container's torch/CUDA

docs/DGX_SPARK.md

src/nemo_safe_synthesizer/config/training.py

mckornfield · 2026-04-01T02:06:29Z

pyproject.toml

  "vllm==0.15.0",
-  "xformers==v0.0.33.post2; sys_platform == 'linux'",
+  "xformers==v0.0.33.post2; sys_platform == 'linux' and platform_machine == 'x86_64'",
 ]


hmmm do we want to keep the sys_platform markers? locking is borked in CI right now and I've been fighting it by putting linux markers in

docs/developer-guide/dgx-spark.md

docs/DGX_SPARK.md

kendrickb-nvidia · 2026-04-01T17:18:06Z

pyproject.toml

  "vllm==0.15.0",
-  "xformers==v0.0.33.post2; sys_platform == 'linux'",
+  "xformers==v0.0.33.post2; sys_platform == 'linux' and platform_machine == 'x86_64'",
 ]


We really will need a spark and/or station as part of our github runners so we make sure this continues to work.

- Move DGX_SPARK.md to docs/developer-guide/ and add to mkdocs nav

- Make DGX Spark doc description Spark-specific - Remove SSH clone hint (repo is public) - Update docs link to github pages URL - Combine nested if into single condition in training.py

mvansegbroeck requested review from a team as code owners March 19, 2026 17:37

binaryaaron requested changes Mar 19, 2026

View reviewed changes

pyproject.toml Outdated Show resolved Hide resolved

mvansegbroeck force-pushed the maarten/aarch64-spark-support branch 6 times, most recently from 59d13ec to 83fff62 Compare March 24, 2026 01:22

binaryaaron reviewed Mar 24, 2026

View reviewed changes

binaryaaron previously approved these changes Mar 24, 2026

View reviewed changes

mvansegbroeck dismissed binaryaaron’s stale review via c3bf79b March 24, 2026 17:04

mvansegbroeck force-pushed the maarten/aarch64-spark-support branch 5 times, most recently from a113a39 to 655ca6c Compare March 24, 2026 21:22

mvansegbroeck changed the title ~~Maarten/aarch64 spark support~~ feat: Maarten/aarch64 spark support Mar 24, 2026

mvansegbroeck force-pushed the maarten/aarch64-spark-support branch 2 times, most recently from 8408cca to 819bdb8 Compare March 26, 2026 00:05

mvansegbroeck and others added 4 commits March 31, 2026 10:03

fix: allowlist secret pragma and use Self type hint

27db97f

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: mvansegbroeck <mvansegbroeck@gmail.com>

chore: fix formatting and bump uv version ceiling to <0.12.0

c7f1cf7

Signed-off-by: mvansegbroeck <mvansegbroeck@gmail.com>

mvansegbroeck force-pushed the maarten/aarch64-spark-support branch from f3b9988 to c7f1cf7 Compare March 31, 2026 17:04

fix: pass model as positional arg to satisfy type checker and update …

2688050

…lockfile Signed-off-by: mvansegbroeck <mvansegbroeck@gmail.com>

mckornfield reviewed Apr 1, 2026

View reviewed changes

kendrickb-nvidia reviewed Apr 1, 2026

View reviewed changes

mvansegbroeck force-pushed the maarten/aarch64-spark-support branch from 94bf848 to 2688050 Compare April 7, 2026 16:55

mvansegbroeck added 2 commits April 7, 2026 09:57

fix: address PR review feedback for aarch64 Spark support

84a5419

- Move DGX_SPARK.md to docs/developer-guide/ and add to mkdocs nav

fix: address remaining PR review feedback

83089d1

- Make DGX Spark doc description Spark-specific - Remove SSH clone hint (repo is public) - Update docs link to github pages URL - Combine nested if into single condition in training.py

Conversation

mvansegbroeck commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Pre-Review Checklist

Pre-Merge Checklist

Uh oh!

binaryaaron left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

binaryaaron left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

binaryaaron left a comment

Choose a reason for hiding this comment

Uh oh!

binaryaaron commented Mar 24, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

mvansegbroeck commented Mar 19, 2026 •

edited

Loading